Abstract

Always targeting the best achievable bit error rate (BER) performance in iterative receivers operating
over multiple-input multiple-output (MIMO) channels may waste significant resources,
especially when the achievable BER is orders of magnitude better than the target performance (e.g.,
under good channel conditions and at high signal-to-noise ratio (SNR)). In contrast to typical iterative
schemes, a practical iterative decoding framework that approximates the soft-information exchange is
proposed. It allows reduced-complexity sphere and channel decoding that is adjustable to the transmission
conditions and the required bit error rate. With the proposed approximate soft information exchange, the
performance of the exact soft information exchange can still be reached with significant complexity gains.

Index Terms 

MIMO systems, iterative methods, soft-input soft-output detection, sphere decoding 

I. Introduction 

Multiple-input, multiple-output (MIMO) transmission with iterative receiver processing of soft
information has been proposed as a very efficient way to achieve near-capacity transmission at the cost of a
highly increased computational effort [1], [2]. These increased processing requirements prevent practical
implementations from meeting the theoretical performance limits, due to the resulting increased energy
consumption and latency.

Although the best achievable performance is required for transmission over "unfavorable" transmission
environments (i.e., ill-conditioned transmission channels and the low SNR regime) and for demanding
performance targets (in terms of BER), it may be unnecessary for transmission over a good channel and/or for
reduced performance needs. For the non-iterative case, and in order to avoid the highly complex optimal solution
when it is not required, receivers supporting both optimal and suboptimal algorithmic solutions (e.g.,
zero-forcing, MMSE) have been proposed [3]. However, such approaches impose an increased (silicon)
area occupation and involve tedious selection processes in order to choose the appropriate algorithm
from the set of available ones. These selection processes typically demand performance prediction
methods. Therefore, the applicability of such methods is restricted to those scenarios where performance
prediction is available and where the selection strategies are computationally efficient (i.e., they can be
performed with low processing overhead). Consequently, such approaches are not convenient for iterative
systems, where performance prediction (per iteration) is very difficult to obtain, especially when
sub-optimal algorithms are involved in the iterative process.

The proposed scheme avoids the unnecessary processing which would further increase
the reliability of those bits which already reach the required performance (i.e., meet the target bit error
rate, TER) at early iterations of the decoding process. This simplification is performed only when the
convergence of the iterative process is not expected to be significantly affected (both in terms of
convergence point and convergence rate). Instead of supporting several soft demapping algorithms, the
proposed iterative decoding framework employs a single, flexible soft demapper which is efficiently realized
in terms of sphere decoding (SD) and can adjust its complexity to the given channel scenario and to the
required target bit error rate (TER) performance. Additional savings can be achieved at the channel decoder
side since the proposed framework allows selective decoding.

The proposed approach can be applied to any SD scheme. However, the SD of [4] is employed, which
requires only minor modifications in order to be accommodated in the proposed scheme. Furthermore,
this SD can ensure the (exact) max-log MAP performance when it is required. In order to adjust the
SD's complexity, the scheme selectively updates only the log-likelihood ratio (LLR) values of the bits
whose exact value is required to preserve the TER and the convergence properties of the iterative process.
Additionally, performance-driven LLR clipping is employed to avoid the unnecessary processing which
would eventually result in BER performance better than the TER. The fundamental concept of reducing
SD's complexity by bounding the LLR clipping value at the cost of reduced BER performance has 
been proposed and demonstrated via extensive simulations in lH, Q. However, the question of how to 
practically set the LLR clipping value in order to adjust (on-the-fly) the receiver's complexity to the 
TER requirements has not been addressed. In this context, several approaches of different efficiency 
are discussed for relating the LLR clipping value to the TER. The proposed approaches do not require 
the exact relationship between the LLR clipping value and the resulting performance. Therefore, they 
are generally applicable to any kind of transmission scenario (i.e., channel, SNR, coding scheme, etc.) 
without demanding extensive simulations and/or any tedious and computationally intensive performance- 
prediction methods which should account for all the performance-affecting parameters and all the possible 
transmission scenarios. 

The paper is organized as follows. In Section II the typical soft-input, soft-output (SISO) processing 
for transmission over MIMO channels is outlined and the basic observations which are explored by the 
proposed approach are made. In Section III the details of the modified, approximate, iterative processing 
are presented together with the required modifications to the SD and the channel decoder. Complexity 
issues are also discussed. Finally, in Section IV, the proposed approach is validated and the corresponding 
complexity gains are depicted via extensive simulations. 

II. Soft-Input, Soft-Output Receiver Processing for MIMO Systems 

Typically, as shown in Fig. 1, during the $q$-th iteration and over several MIMO channel utilizations,
the soft-demapper module employs the corresponding a-priori soft information vector $\mathbf{L}_A^{(q)}$ (nulled for
$q = 0$) and the related received vectors $\mathbf{y}$ in order to calculate the a-posteriori (or intrinsic) soft
information $\mathbf{L}_D^{(q)}$ as well as the extrinsic soft information $\mathbf{L}_E^{(q)} = \mathbf{L}_D^{(q)} - \mathbf{L}_A^{(q)}$ of the coded bits.
The extrinsic soft information is then de-interleaved and fed to the SISO channel decoder as a-priori
information in order to calculate the channel decoder's a-posteriori soft information. This can be used by an
early-stopping module to evaluate the resulting error rate performance, to be discussed later in detail.
Then, if the TER performance has been reached, the iterations stop and hard decisions are made based on the
decoder's a-posteriori information. Otherwise, the decoder's extrinsic information is calculated (as the
difference between its a-posteriori and a-priori information) and, after being interleaved, it is fed to
the soft-demapper module as the a-priori information $\mathbf{L}_A^{(q+1)}$ to be used during the next iteration.
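
The exchange loop described above can be sketched in a few lines of Python. This is only an illustrative skeleton under stated assumptions: `demap`, `decode`, and `stop` are hypothetical callables standing in for the soft demapper, the SISO channel decoder, and the early-stopping check, and the interleaver is modeled as a plain index permutation.

```python
import numpy as np

def iterative_receive(demap, decode, perm, stop, max_iter=8):
    """Skeleton of the iterative soft-information exchange.

    demap(l_a)  -> demapper a-posteriori LLRs in interleaved order (hypothetical)
    decode(l_a) -> decoder a-posteriori LLRs in de-interleaved order (hypothetical)
    perm        -> interleaver permutation: interleaved[i] = original[perm[i]]
    stop(l_d)   -> True when the early-stopping criterion is met (hypothetical)
    """
    inv = np.argsort(perm)                 # inverse permutation (de-interleaver)
    l_a_demap = np.zeros(len(perm))        # a-priori input is nulled for q = 0
    for q in range(max_iter):
        # Demapper extrinsic = a-posteriori minus a-priori.
        l_e_demap = demap(l_a_demap) - l_a_demap
        l_a_dec = l_e_demap[inv]           # de-interleave
        l_d_dec = decode(l_a_dec)
        if stop(l_d_dec):
            break
        # Decoder extrinsic, interleaved, becomes the next demapper a-priori input.
        l_a_demap = (l_d_dec - l_a_dec)[perm]
    return np.sign(l_d_dec), q + 1
```

The function returns the hard decisions and the number of iterations actually run, which makes the effect of any stopping rule directly observable.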

A. Soft Demapping in Terms of Sphere Decoding 

In MIMO transmission with $M_t$ transmit and $M_r \geq M_t$ receive antennas, at the $u$-th MIMO channel
utilization, the interleaved coded bits are grouped into blocks $B_{t,u}$ ($t = 1, \ldots, M_t$ and $u = 1, \ldots, U$, with
$U$ being the number of channel utilizations per code block) in order to be mapped onto symbols $s_{t,u}$
of a constellation set $S$ of cardinality $|S|$. The bipolar $k$-th bit resides in block $B_{\lceil k/\log_2 |S| \rceil, u}$ and the
blocks $B_{t,u}$ are mapped onto the symbols $s_{t,u}$ by a given mapping function (e.g., Gray mapping). The
corresponding received $M_r \times 1$ vector $\mathbf{y}_u$ is then given by

$$\mathbf{y}_u = \mathbf{H}_u \mathbf{s}_u + \mathbf{n}_u \tag{1}$$

where $\mathbf{H}_u$ is the $M_r \times M_t$ complex channel matrix and $\mathbf{s}_u = [s_{1,u}, s_{2,u}, \ldots, s_{M_t,u}]^T$ is the transmitted
symbol vector. Then, $c_{b,i,u}$ is the $b$-th bit of the $i$-th entry of $\mathbf{s}_u$ and the term $\mathbf{n}_u$ is the noise vector,
consisting of i.i.d., zero-mean, complex Gaussian samples with variance $2\sigma_n^2$.

As already noted, the role of the soft demapper is to calculate at every iteration the a-posteriori
log-likelihood ratios (LLRs) for all the symbols residing in the frame to be decoded. Namely, it calculates

$$L_D(c_{b,i,u}) = \ln \frac{P[c_{b,i,u} = +1 \mid \mathbf{y}_u, \mathbf{H}_u]}{P[c_{b,i,u} = -1 \mid \mathbf{y}_u, \mathbf{H}_u]} \tag{2}$$

Assuming that the corresponding bits are statistically independent (due to interleaving) and by employing
Bayes' theorem, (2) can be expressed as

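$$L_D(c_{b,i,u}) = \ln \frac{\sum_{\mathbf{s}_u \in S^{+1}_{b,i,u}} p(\mathbf{y}_u \mid \mathbf{s}_u, \mathbf{H}_u) P[\mathbf{s}_u]}{\sum_{\mathbf{s}_u \in S^{-1}_{b,i,u}} p(\mathbf{y}_u \mid \mathbf{s}_u, \mathbf{H}_u) P[\mathbf{s}_u]} \tag{3}$$

where

$$p(\mathbf{y}_u \mid \mathbf{s}_u, \mathbf{H}_u) = \frac{1}{(2\pi\sigma_n^2)^{M_r}} \exp\left(-\frac{1}{2\sigma_n^2} \|\mathbf{y}_u - \mathbf{H}_u \mathbf{s}_u\|^2\right) \tag{4}$$

and $S^{\pm 1}_{b,i,u}$ are the sub-sets of possible $\mathbf{s}_u$ symbol sequences having the $b$-th bit value of their $i$-th entry
equal to $\pm 1$, while $P[\mathbf{s}_u]$ is the available a-priori information provided by the channel decoder at the
previous iteration. However, in order to compute (3), exhaustive calculations over all possible symbol vectors
are required, which leads to prohibitive computational complexity, especially for large $M_t$ values. This
problem is typically tackled by employing the standard max-log approximation and by QR decomposition
of the MIMO channel matrix [4]. Then the problem transforms into an equivalent tree search which can be
efficiently solved by means of sphere decoding. In detail, the channel matrix $\mathbf{H}_u$ can be QR decomposed
into $\mathbf{H}_u = \mathbf{Q}_u \mathbf{R}_u$, with $\mathbf{Q}_u$ a unitary $M_r \times M_t$ matrix and $\mathbf{R}_u$ an $M_t \times M_t$ upper triangular matrix
with elements $R_{i,j,u}$ and real-valued positive diagonal entries. Then, under the max-log approximation
the LLR calculation problem can be transformed to [1], [4]

$$L_D(c_{b,i,u}) \approx \min_{\mathbf{s}_u \in S^{-1}_{b,i,u}} \{I(\mathbf{s}_u)\} - \min_{\mathbf{s}_u \in S^{+1}_{b,i,u}} \{I(\mathbf{s}_u)\} \tag{5}$$

where $I(\mathbf{s}_u) = I_{channel}(\mathbf{s}_u) + I_{prior}(\mathbf{s}_u)$, with

$$I_{channel}(\mathbf{s}_u) = \frac{1}{2\sigma_n^2} \|\mathbf{y}'_u - \mathbf{R}_u \mathbf{s}_u\|^2 = \frac{1}{2\sigma_n^2} \sum_{l=1}^{M_t} \Big| y'_{l,u} - \sum_{j=l}^{M_t} R_{l,j,u} s_{j,u} \Big|^2 \tag{6}$$

being the channel-based part of the soft information, with $\mathbf{y}'_u = \mathbf{Q}_u^H \mathbf{y}_u$,
while, for statistically independent symbols,

$$I_{prior}(\mathbf{s}_u) = -\ln P[\mathbf{s}_u] = -\sum_{l=1}^{M_t} \ln P[s_{l,u}] \tag{7}$$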
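being the a-priori part of the soft information, which is always non-negative. Eq. (5) shows how the
max-log LLR calculation can be reformulated into two constrained minimization problems per decoded
bit over the different symbol-vector subsets (i.e., $S^{\pm 1}_{b,i,u}$).
Since $P[c_{j,l,u}]$ is related to its corresponding LLR as

$$P[c_{j,l,u}] = \frac{\exp\left(\frac{1}{2}\left(c_{j,l,u} L_A(c_{j,l,u}) - |L_A(c_{j,l,u})|\right)\right)}{1 + \exp\left(-|L_A(c_{j,l,u})|\right)} \tag{8}$$

and after removing the terms that are common to the two minimization problems of (5), and therefore cancel
in their difference, (7) becomes

$$I_{prior}(\mathbf{s}_u) = \frac{1}{2} \sum_{l=1}^{M_t} \sum_{j=1}^{\log_2 |S|} \left( |L_A(c_{j,l,u})| - c_{j,l,u} L_A(c_{j,l,u}) \right) \tag{9}$$

without affecting the optimality of (5). The extrinsic information is then calculated by subtracting the
a-priori from the a-posteriori information and, after de-interleaving, it is fed to the SISO channel decoder
as a-priori information

$$L_A(c_k = \pi^{-1}(c_{b,i,u})) = L_E(c_{b,i,u}) = L_D(c_{b,i,u}) - L_A(c_{b,i,u})$$

with $k = 1, \ldots, K$ and $K = U M_t \log_2 |S|$. Then,

$$L_E(c_{b,i,u}) \approx \min_{\mathbf{s}_u \in S^{-1}_{b,i,u}} \{I'(\mathbf{s}_u)\} - \min_{\mathbf{s}_u \in S^{+1}_{b,i,u}} \{I'(\mathbf{s}_u)\} \tag{10}$$

with $I'(\mathbf{s}_u) = I_{channel}(\mathbf{s}_u) + I'_{prior}(\mathbf{s}_u)$, and

$$I'_{prior}(\mathbf{s}_u) = \frac{1}{2} \sum_{l=1}^{M_t} \sum_{\substack{j=1 \\ (j,l) \neq (b,i)}}^{\log_2 |S|} \left( |L_A(c_{j,l,u})| - c_{j,l,u} L_A(c_{j,l,u}) \right) \tag{11}$$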
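
To make the max-log formulation concrete, the following minimal sketch evaluates the two minimizations of (5) by exhaustive enumeration for one small channel use. It is a reference-style illustration, not the sphere decoder discussed in this paper: it uses the unfactored metric $\|\mathbf{y}_u - \mathbf{H}_u \mathbf{s}_u\|^2$ instead of the QR form, and the names `prior_cost`, `maxlog_llrs`, `qpsk`, and the `mapper` argument are assumptions introduced only for this example.

```python
import itertools
import numpy as np

def prior_cost(bits, l_a):
    """A-priori metric of (9): 0.5 * sum(|L_A| - c * L_A) over all bits."""
    return 0.5 * np.sum(np.abs(l_a) - bits * l_a)

def maxlog_llrs(y, h, l_a, sigma2, mapper):
    """Exhaustive evaluation of (5) for every bit of one channel use.

    y, h   : received vector and channel matrix (M_r x M_t)
    l_a    : a-priori LLRs, shape (M_t, bits per symbol)
    sigma2 : sigma_n^2, i.e. half the noise variance per complex sample
    mapper : hypothetical function mapping a row of +/-1 bits to a symbol
    """
    m_t, n_b = l_a.shape
    # best[i, b, 0] / best[i, b, 1]: smallest metric with bit (b, i) = -1 / +1
    best = np.full((m_t, n_b, 2), np.inf)
    for flat in itertools.product((-1.0, 1.0), repeat=m_t * n_b):
        bits = np.array(flat).reshape(m_t, n_b)
        s = np.array([mapper(row) for row in bits])
        cost = np.linalg.norm(y - h @ s) ** 2 / (2.0 * sigma2) + prior_cost(bits, l_a)
        idx = (bits > 0).astype(int)
        for i in range(m_t):
            for b in range(n_b):
                best[i, b, idx[i, b]] = min(best[i, b, idx[i, b]], cost)
    l_d = best[..., 0] - best[..., 1]   # min over S^{-1} minus min over S^{+1}
    return l_d, l_d - l_a               # a-posteriori and extrinsic LLRs

def qpsk(row):
    """Illustrative unit-energy QPSK mapping of two bipolar bits."""
    return (row[0] + 1j * row[1]) / np.sqrt(2.0)
```

The enumeration cost grows as $|S|^{M_t}$, which is exactly the complexity that the tree-search formulation is meant to avoid.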

By inspecting the equations above, some basic observations can be made which will later be exploited
by the proposed approach. As can be seen from (6) and (9), even if $I_{channel}$ significantly affects
the solution of the minimization problems of (5), it does not vary over iterations, unlike $I_{prior}$. In
addition, the strongly unlikely bits (i.e., the ones of opposite sign to $L_A$ with high $|L_A|$, which result in
low $P[c]$, see (8)) contribute with high $I_{prior}$ values (see (9)). Thus, the symbol vectors consisting of
such bits can be assumed to be weak candidate solutions for the minimization problems of (5). On the
other hand, the highly likely bits (of the same sign as $L_A$) contribute with zero $I_{prior}$. Therefore,
the symbol-vector solutions are expected to consist of the symbols having the smallest $I_{channel}$ among
the ones with small $I_{prior}$ values, namely, among the symbols consisting of highly likely bits (i.e., of the
same sign as $L_A$) and loosely unlikely bits (i.e., of low $|L_A|$ and opposite sign to $L_A$). Since for a
high $|L_A|$ value the bit with the opposite sign to $L_A$ is not expected to belong to the symbol-vector
solutions, while the bit with the same sign as $L_A$ contributes with zero $I_{prior}$ independently of the $|L_A|$
value, it can be concluded that approximate calculation of the related strong (i.e., of high amplitude)
a-priori information is not expected to significantly affect SD performance.

An additional property related to the soft-information flow is that, since $I_{channel}$ remains constant over
iterations, the value of $L_E(c_{b,i,u})$ will also be constant as long as the symbol-vector solutions of (5)
do not vary over iterations and consist of (highly or loosely) likely bits contributing with zero $I_{prior}$,
even if the a-priori information varies. This attribute is expected over later iterations when (and if) the
corresponding symbol solutions are dominated by the highly likely symbols. Then, instead of recalculating
their corresponding $L_D$ and $L_E$ values, the ones of the previous iteration can be employed. This property
can be exploited by any SD approach to provide computational complexity gains. These gains are expected
to increase with the number of iterations, where the average reliability of the bits is expected to increase
0. 



B. SISO Channel Decoding 

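Similarly to the soft demapper, after de-interleaving, the corresponding input soft information is
exploited in order to calculate the corresponding a-posteriori information, as

$$L_D(c_k) = \ln \frac{P[c_k = +1 \mid \mathbf{L}_A]}{P[c_k = -1 \mid \mathbf{L}_A]} \tag{12}$$

where, for $\mathbf{c}$ being the encoded sequence after de-interleaving, it is expressed as

$$L_D(c_k) = \ln\left( \sum_{\mathbf{c} \in C_k^{+1}} \exp \sum_{i=1}^{K} \ln P[c_i \mid L_A(c_i)] \right) - \ln\left( \sum_{\mathbf{c} \in C_k^{-1}} \exp \sum_{i=1}^{K} \ln P[c_i \mid L_A(c_i)] \right) \tag{13}$$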
with $C_k^{\pm 1}$ being the set of bit sequences $\mathbf{c}$ with their $k$-th bit equal to $\pm 1$. Then, (13) can be efficiently
calculated by the well-known BCJR-MAP algorithm Q, HI. 

Observations similar to those holding for the soft demapper can be made for the SISO outer channel
decoder. In detail, (13) shows that the most significant contributing sequences $\mathbf{c}$ to the $L_D$ calculation
are those with their non-positive $\sum_{i=1}^{K} \ln P[c_i \mid L_A(c_i)]$ terms being close to zero, or equivalently,
sequences which do not contain highly unlikely bits (of very low $P[c_i \mid L_A(c_i)]$). Additionally, under the
approximation of jOl 



$$\ln P[c_k \mid L_A(c_k)] \approx \frac{1}{2} \left( c_k L_A(c_k) - |L_A(c_k)| \right) \tag{14}$$

which holds for large $|L_A(c_k)|$ values (typically larger than 2), it can be deduced that for (highly) likely
bits the terms $\ln P[c_i \mid L_A(c_i)]$ equal zero independently of the exact $L_A$ value. Therefore, similar to the
SD, approximate calculation of the strong soft information (i.e., of high $|L_A(c_k)|$) is not expected to
significantly affect the outcome of the SISO channel decoder.
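
The quality of approximation (14) can be checked numerically. The sketch below compares it with the exact log-probability of a bipolar bit given its LLR; the only term dropped by (14) is $\ln(1 + e^{-|L_A|})$, which is about 0.127 at $|L_A| = 2$ and decays quickly after that. The function names are illustrative and do not come from the paper.

```python
import math

def log_prob_exact(c, l_a):
    """Exact ln P[c | L_A] for a bipolar bit c in {-1, +1}."""
    return 0.5 * (c * l_a - abs(l_a)) - math.log1p(math.exp(-abs(l_a)))

def log_prob_approx(c, l_a):
    """Approximation (14): the ln(1 + exp(-|L_A|)) term is dropped."""
    return 0.5 * (c * l_a - abs(l_a))

if __name__ == "__main__":
    for l_a in (0.5, 2.0, 4.0, 8.0):
        gap = log_prob_approx(+1, l_a) - log_prob_exact(+1, l_a)
        print(f"|L_A| = {l_a:4.1f}  approximation error = {gap:.5f}")
```

For a bit whose sign agrees with its a-priori LLR the approximate term is exactly zero, regardless of how large the LLR is, which is the property the argument above relies on.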

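By using (13) we can express the extrinsic information of the outer SISO decoder as

$$L_E(c_k) = L_D(c_k) - L_A(c_k) = \ln\left( \sum_{\mathbf{c} \in C_k^{+1}} \exp \sum_{\substack{i=1 \\ i \neq k}}^{K} \ln P[c_i \mid L_A(c_i)] \right) - \ln\left( \sum_{\mathbf{c} \in C_k^{-1}} \exp \sum_{\substack{i=1 \\ i \neq k}}^{K} \ln P[c_i \mid L_A(c_i)] \right) \tag{15}$$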

from which it becomes apparent that the extrinsic information is a function of the soft information of all
bits other than $c_k$. Practically, those bits which mainly affect the calculation of $L_D(c_k)$, here
denoted as $\mathcal{A}_{c_k}$, are the ones residing in a region around $c_k$ of a size related to the constraint length of
the code [10]. Thus, a high $|L_E(c_k)|$ value denotes that the a-priori information of the surrounding $\mathcal{A}_{c_k}$
bits can introduce such high reliability to the bit that it finally becomes strongly reliable even if its own
a-priori information is loose (since $L_D(c_k) = L_E(c_k) + L_A(c_k)$). Therefore, it will later be assumed
that $|L_E(c_k)|$ is a good indicator of the correcting capabilities of the outer code on the specific bit.

Since $L_E(c_k)$ is a function of the "new" soft-information content calculated at the soft-demapper side
($L_A(\mathcal{A}_{c_k}) = L_E(\pi(\mathcal{A}_{c_k}))$), it is also assumed to be a good indicator of the convergence rate of the
corresponding bit. Therefore, in the sequel, when the extrinsic information of a bit is loose, even if its
a-priori information is strong (which will lead to strong a-posteriori information), its soft information
exchange will not be approximated, in order to preserve its convergence rate. In other words, loose extrinsic
information for $c_k$ indicates loose information content for (at least some of) the bits belonging to $\mathcal{A}_{c_k}$.
Therefore, since the decoding quality of the $\mathcal{A}_{c_k}$ bits is also affected by the soft information content
of $c_k$ (since $c_k$ belongs to their own $\mathcal{A}$ regions), approximating its soft information may
affect the convergence of the $\mathcal{A}_{c_k}$ bits. At this point the following additional relation can be brought up.
The overall convergence rate of the iterative process is dominated by the slowest converging bit. Thus,
approximating the soft information of the fast converging bits (or equivalently the ones with the strong
a-priori information to be fed to the following module) does not significantly affect convergence. In
contrast, the convergence properties can be affected by approximating the soft information of the slowest
converging bits (as already discussed in Sections II.A and II.B).

Finally, it is noted that when the $\mathcal{A}_{c_k}$ bits have reached the state of constant information exchange
(i.e., over later iterations), $L_E(c_k)$ will also be constant.

C. Performance Driven Early Iteration Stopping 

In practical iterative schemes, frame-based early-stopping mechanisms are typically employed to reduce
the average required number of iterations without compromising the resulting performance. In such
mechanisms, the convergence status is checked by means of a specific pre-selected criterion for terminating
iterations. Two main classes of criteria have been proposed for this purpose. The first class is based on
cross-entropy metrics [11]-[13] and it is typically employed to identify if a frame still converges over
iterations. The second class [11], [14]-[16] directly employs the calculated a-posteriori LLR values
to evaluate the already achieved error rate performance, and then terminates iterations when the TER is
achieved. Even if the cross-entropy methods have been shown to be more efficient in identifying if an
iterative system is still converging, in this work the second approach is employed since it links the number
of iterations directly to the error-rate performance. Thus, it can terminate the iteration process early when
the TER is reached, even if the iterative system still converges.

According to [11], the BER of the decoded block can be evaluated after SISO decoding as

$$\mathrm{BER} \approx \frac{1}{N_I} \sum_{i=1}^{N_I} \frac{1}{1 + \exp\left(|L_D(c_i^I)|\right)} \tag{16}$$

where $c_i^I$ are the $N_I$ information bits. From the above equation some simple conclusions can be drawn
which will be exploited by the proposed approach. The provided BER, as well as the corresponding
estimate, are expected to be dominated by the bits with small $|L_D(c_i^I)|$ values. Thus, for reliable error
rate prediction and in order to preserve the convergence behavior of the iterative process, no approximation
is attempted for the weak $L_D(c_i^I)$ values. In the same context, if all contributing terms in (16) with a
per-bit error probability larger than the TER (or equivalently with $|L_D(c_i^I)| < L_{TER} = \ln(TER^{-1} - 1)$) are
accurately calculated while the others are clipped to a value not smaller than $L_{TER}$, no significant error
rate performance degradation is expected for the SNR regimes where the achievable error rate is lower than
the TER. However, when clipping is applied, special consideration has to be taken so that the system's
convergence is not affected. In this framework, several clipping approaches have been considered (see
Section III.B). It is significant to note that the proposed approximate information flow does not demand any
early-stopping mechanism and it does not depend on the choice of the adopted criterion. However, it is
considered in this work in order to make meaningful performance comparisons, since such mechanisms are
anticipated in practical iterative systems capable of adjusting their complexity to the transmission scenario
(e.g., SNR) and the TER.
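
As a small illustration of the LLR-based stopping rule, the sketch below computes the block error-rate estimate from a-posteriori LLRs and the threshold $L_{TER} = \ln(TER^{-1} - 1)$ at which a single bit contributes exactly the TER. The function names are assumptions made for this example, and the estimate is written as $e^{-|L|}/(1 + e^{-|L|})$, which equals $1/(1 + e^{|L|})$ but avoids overflow for very large LLR magnitudes.

```python
import math

def estimated_ber(l_d_info):
    """Average of 1 / (1 + exp(|L_D|)) over the information bits."""
    total = 0.0
    for l in l_d_info:
        e = math.exp(-abs(l))          # in (0, 1], so no overflow is possible
        total += e / (1.0 + e)
    return total / len(l_d_info)

def llr_threshold(ter):
    """L_TER = ln(1/TER - 1): LLR magnitude whose error probability equals TER."""
    return math.log(1.0 / ter - 1.0)

def should_stop(l_d_info, ter):
    """Early-stopping check: stop once the estimated BER meets the target."""
    return estimated_ber(l_d_info) <= ter
```

For a target of $10^{-3}$ the threshold is $\ln(999) \approx 6.9$, so bits whose LLR magnitude already exceeds this value contribute less than the TER each to the estimate.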

III. Iterative Receiver Processing of Approximate Soft Information 

A. Approximate Soft Information Flow 

The proposed approximate soft information flow is depicted in Fig. 1. The scheme adjusts its processing
requirements to the TER performance by avoiding the unnecessary processing which would further
increase the reliability of those bits which are already reliable enough (i.e., meet the TER) at early
iterations, but only when such an approximation is not expected to significantly affect the convergence
properties of the iterative process. Since the extrinsic information has been discussed to be a good
indicator of both the per-bit correcting capabilities of the channel code and of the per-bit convergence
rate (Section II.B), it is employed to decide when the soft information of a specific bit can be safely
approximated. In detail, the proposed approach consists of the following steps:

1) Identification of Reliable and Well Converging (RWC) Bits.

Using the SISO decoder output at the $q$-th iteration and after the stopping-control check, the reliable
bits (i.e., those which meet the TER requirement) with high convergence rate are identified since,
most likely, their exact soft-information calculation is not required. In order to characterize the bits,
their a-posteriori and extrinsic information is used as described in Section II.B. Consequently, a
flag sequence is introduced to indicate the bits whose a-posteriori and extrinsic information (which
is also the a-priori information for the soft demapper) is larger than $L_{TER}$, with $G^{(q)}(k) = 1$ when
both $L_E(c_k)$ and $L_D(c_k)$ are larger than $L_{TER}$ and $G^{(q)}(k) = 0$ otherwise.
When a bit is identified to be an RWC one, it is assumed (perhaps wrongly) that it has reached
its constant information flow state. However, if during the next iteration it is not again identified
to be an RWC bit, the initial assumption (made over the previous iteration) was obviously wrong and
needs to be corrected. In detail, as has already been discussed in Section II.B, if a bit has
been wrongly assumed to be an RWC one (and therefore its soft information has wrongly not been
updated), a negative effect on the convergence characteristics of its neighbor bits is expected (which
can also be RWC ones). This negative effect is typically reflected in their extrinsic and a-posteriori
information (see Section II.B). In the same way, affecting the convergence rate of the neighboring
bits will affect the soft information of the bit in question (which can become a non-RWC bit).
Therefore, in order to remedy wrong RWC bit characterizations which could significantly affect
the system's convergence, a full RWC check over all bits takes place at each iteration.

2) Reduced Processing Soft Demapping

After the RWC bit identification, the flag sequence $G^{(q)}(k)$ is interleaved and the position of the
RWC bits at the soft demapper side is identified. Then, reduced complexity soft demapping can
be performed. For this purpose a slight modification of the SD in [4] is described in the next
sub-section. There, the SD reduces its processing requirements by skipping the soft information
calculation of the RWC bits and by approximately calculating (bounding) the soft-output values of
those bits which result in extrinsic information larger than $L_{TER}$. As will be discussed later in
detail, this is only performed when such an approximation is not expected to significantly affect
the convergence behavior of the iterative system.
3) SISO Decoding with Approximate Soft Information

Finally, after calculating and de-interleaving the (approximate) extrinsic information of the SD, SISO
outer decoding follows. For the RWC bits the soft information has not been updated. Therefore, the
a-priori information of the previous iteration is employed (since constant flow has been assumed).
Then, the processing proceeds with an early-stopping check and an RWC bit update (step 1).
During the decoding process, no changes are expected in the status of an RWC bit if all its A bits
(i.e., the neighboring bits related to its decoding, see Section II.B) are also RWC ones. On the contrary,
changes may occur whenever non-RWC bits lie in its A region. Based on this observation, instead
of performing full channel decoding, selective decoding can be performed only on the non-RWC
bits and their corresponding A neighbors, resulting in additional complexity gains at the channel
decoder side, as will be discussed in detail in Section III.C. In this context, a scenario-adaptive
SISO channel decoder may perform selective decoding only on the non-RWC bits and their surrounding
bits belonging to a window of length w centered on each non-RWC bit, so that the non-updated
RWC bits do not have any non-RWC ones in their A region.

As will be shown by simulations, the proposed RWC identification is so reliable that no significant
changes occur in the state of the RWC bits over later iterations, especially for low TER values. Then,
the potential gains at the SISO decoder side can be maximized by setting w = 1.
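
The RWC flagging of step 1 and the window-based selection of step 3 can be illustrated with a short sketch. It assumes that the comparison with $L_{TER}$ is made on LLR magnitudes, which is one reading of the description above, and that the window of length w is centered symmetrically on each non-RWC bit; the function names are hypothetical.

```python
import numpy as np

def rwc_flags(l_d, l_e, l_ter):
    """Flag G(k): 1 when both |L_D(c_k)| and |L_E(c_k)| exceed L_TER, else 0.

    Using magnitudes is an assumption made for this sketch.
    """
    return ((np.abs(l_d) > l_ter) & (np.abs(l_e) > l_ter)).astype(np.uint8)

def bits_to_decode(flags, w):
    """Boolean mask of bits covered by a length-w window around each non-RWC bit."""
    half = w // 2
    mask = np.zeros(len(flags), dtype=bool)
    for k in np.flatnonzero(flags == 0):
        mask[max(0, k - half): k + half + 1] = True
    return mask

if __name__ == "__main__":
    l_d = np.array([10.0, -9.0, 2.0, 8.0, 12.0])
    l_e = np.array([8.0, -7.5, 1.0, 3.0, 9.0])
    flags = rwc_flags(l_d, l_e, 6.9)
    print(flags, bits_to_decode(flags, 1), bits_to_decode(flags, 3))
```

With w = 1 only the non-RWC bits themselves are selected, which corresponds to the setting that maximizes the potential decoder-side savings; larger windows trade part of that saving for protection against wrong RWC characterizations.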
B. Scenario-Adaptive SD

The herein proposed scenario-adaptive SD is based on the typical, depth-first SD approach of [4].
However, as discussed in Section I, the proposed approximate soft information flow is independent of
the soft demapper realization approach.

In detail, in (5) it is shown how the max-log LLR calculation can be reformulated into two constrained
minimization problems over the different symbol-vector subsets (i.e., $S^{\pm 1}_{b,i,u}$) per decoded bit. For each
minimization problem the corresponding tree has its root at level $l = M_t + 1$ and its leaves at level $l = 1$.
The $I(\mathbf{s}_u)$ values for any leaf can be calculated recursively by

$$D(\mathbf{s}_u^{(l)}) = D(\mathbf{s}_u^{(l+1)}) + I_{channel}^{(l)}(\mathbf{s}_u^{(l)}) + I_{prior}^{(l)}(\mathbf{s}_u^{(l)}) \tag{17}$$

where $\mathbf{s}_u^{(l)} = [s_{l,u}, \ldots, s_{M_t,u}]^T$ are the partial symbol vectors, $D(\mathbf{s}_u^{(M_t+1)}) = 0$,

$$I_{channel}^{(l)}(\mathbf{s}_u^{(l)}) = \frac{1}{2\sigma_n^2} \Big| y'_{l,u} - \sum_{j=l}^{M_t} R_{l,j,u} s_{j,u} \Big|^2 \tag{18}$$

and

$$I_{prior}^{(l)}(\mathbf{s}_u^{(l)}) = \frac{1}{2} \sum_{j=1}^{\log_2 |S|} \left( |L_A(c_{j,l,u})| - c_{j,l,u} L_A(c_{j,l,u}) \right) \tag{19}$$

with $D(\mathbf{s}_u^{(l)})$ being the partial distance (PD) of the $\mathbf{s}_u^{(l)}$ node. Then $I(\mathbf{s}_u) = D(\mathbf{s}_u^{(1)})$.

Depth-first tree traversal with Schnorr-Euchner enumeration [18] and radius reduction are assumed, as
in [4]. The initial radius is set to infinity, and whenever a leaf is reached with its $D(\mathbf{s}_u^{(1)})$ being
smaller than the current squared radius $r^2$, $r^2$ is updated to $D(\mathbf{s}_u^{(1)})$. At each visited node $\mathbf{s}_u^{(l)}$ a
constraint check takes place. If its $D(\mathbf{s}_u^{(l)}) > r^2$, this node, its children, as well as its not yet visited
siblings, are pruned.
In addition, in order to avoid redundant calculations which are common to the different minimization
problems (and tree searches) of (5), the single-tree-search approach of [4] can be employed. According
to this, only one tree search takes place but different $r^2_{\pm,k}$ values are used for each of the minimization
problems of (5), with $r^2_{\pm,k}$ being the squared radii related to the two minimization problems (i.e., for
$S^{\pm 1}_{b,i,u}$ respectively) of the $L_D(c_k)$ calculation. Whenever a new leaf is reached, the $r^2_{\pm,k}$ values of all tree
searches to which the resulting symbol vector belongs are updated. For the constraint check at node $\mathbf{s}_u^{(l)}$,
the set of tree searches whose solution can be affected by the corresponding node is identified as $\mathcal{T}(\mathbf{s}_u^{(l)})$,
and pruning is performed if the corresponding PD is larger than all possible $r^2_{\pm,k} \in \mathcal{T}(\mathbf{s}_u^{(l)})$, namely

$$D(\mathbf{s}_u^{(l)}) > \max_{r^2_{\pm,k} \in \mathcal{T}(\mathbf{s}_u^{(l)})} r^2_{\pm,k} \tag{20}$$
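
The partial-distance recursion, the radius reduction, and the pruning of a node together with its unvisited siblings can be illustrated with a compact depth-first search. The sketch below is deliberately simplified and is not the single-tree-search soft-output SD of [4]: it is hard-output only, it omits the a-priori term and the $1/(2\sigma_n^2)$ scaling, and it orders children by their distance increment instead of implementing a dedicated Schnorr-Euchner enumerator. The name `sphere_decode` and its interface are assumptions made for this example.

```python
import numpy as np

def sphere_decode(y, h, alphabet):
    """Hard-output depth-first sphere decoder with radius reduction.

    Returns the best symbol vector, its squared distance ||Q^H y - R s||^2,
    and the number of visited tree nodes.
    """
    q, r = np.linalg.qr(h)               # h = q @ r with r upper triangular
    y_t = q.conj().T @ y
    m_t = h.shape[1]
    best = {"dist": np.inf, "s": None, "nodes": 0}
    s = np.zeros(m_t, dtype=complex)

    def visit(level, pd):
        # Levels run from m_t - 1 (next to the root) down to 0 (leaves).
        interference = r[level, level + 1:] @ s[level + 1:]
        incs = [abs(y_t[level] - interference - r[level, level] * a) ** 2
                for a in alphabet]
        for idx in np.argsort(incs):     # closest child first
            new_pd = pd + incs[idx]
            if new_pd >= best["dist"]:
                break                    # this child and all remaining siblings are pruned
            best["nodes"] += 1
            s[level] = alphabet[idx]
            if level == 0:
                best["dist"], best["s"] = new_pd, s.copy()   # radius reduction
            else:
                visit(level - 1, new_pd)

    visit(m_t - 1, 0.0)
    return best["s"], best["dist"], best["nodes"]
```

Because children are visited in order of increasing increment, the first child that violates the radius constraint also rules out every remaining sibling, which is the pruning behavior described above. The node counter gives a simple way to observe how a tighter radius reduces the search effort.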

Minor modifications are needed to this SD in order to take advantage of the proposed approximate
information flow. According to those, the proposed SD may perform:

1) Selective Soft Information Update (SU): As already discussed, the tree searches related to the
RWC bits (see (5)) can be skipped. This can be efficiently achieved by zeroing the $r^2_{\pm,k}$ related to the
corresponding bits. Then, since the zeroed radii no longer contribute to the maximum in the pruning
check, the constraint of (23) becomes tighter and significant complexity reduction is achieved, as is also
shown in Section IV.

2) Performance-Driven Soft Information Clipping (PDC): The basic idea behind the proposed
performance-driven LLR clipping is to restrict the SD processing by accurately calculating the LLR values
only up to the value where the convergence and the required TER are preserved. In this context it would be
rational to assume that for the bits which already meet the TER constraint before channel decoding (i.e.,
$|L_D(c_k)| > L_{TER}$) the average performance after decoding will be even better. So, reaching the TER
before decoding is an indication that further processing may not be required. However, relying only on
this assumption to perform LLR clipping is not efficient, since this assumption is only valid for the average
performance and not for each bit. Additionally, performing clipping based on $L_D$ could (erroneously)
result in small $L_A(c_k)$ values which would significantly affect the outcome of the channel decoder
(see discussion in Section II.B). Therefore, additional consideration should be given to the extrinsic
LLR values of the SD (which are the a-priori information for the SISO channel decoder) in
order to preserve the system's convergence.

When $L_E(c_k)$ is of the same sign as $L_D(c_k)$, it is an indication that the iterative process moves towards
increasing the receiver's confidence in the specific (decoded) bit [6]. If, in addition, this bit meets the
TER constraint before SISO channel decoding (i.e., $|L_D(c_k)| > L_{TER}$) and the corresponding extrinsic
information is also strong (i.e., $|L_E(c_k)| > L_{TER}$), it can be roughly assumed that the decoding procedure
is mature enough so that the channel decoder will not decrease the receiver's confidence in the specific
bit in future iterations (i.e., during subsequent iterations $\mathrm{sign}\{L_A(c_k)\} = \mathrm{sign}\{L_D(c_k)\} =
\mathrm{sign}\{L_E(c_k)\}$ will hold). In this case, and since the approximate estimation of the strong LLR values is
not expected to significantly affect the outcome of the decoder (see Section II.B), accurate calculation of
the LLR values resulting in $|L_E(c_k)| = |L_A(\pi^{-1}(c_k))| > L_{TER}$ is not required. On the contrary,
if $\mathrm{sign}\{L_E(c_k)\} \neq \mathrm{sign}\{L_D(c_k)\}$ (i.e., the receiver's confidence in the candidate decoded bit is
not increasing), it is an indication that the iterative decoding process is not yet mature, so LLR clipping
should be avoided in order to preserve the convergence properties. It is significant to note that if LLR
clipping is (erroneously) performed on a bit which converges opposite to the finally decoded bit (i.e.,
$\mathrm{sign}\{L_E(c_k)\} = \mathrm{sign}\{L_D(c_k)\} \neq c_{k,final}$), it is not expected to negatively affect the performance since
clipping practically bounds the effects of this erroneous convergence. This is also verified in Section IV.

According to the previous discussion, the SD search hypersphere should be reduced in a way that
both the convergence and the TER performance after the SD (and before the channel decoder) are
preserved. Equivalently, LLR approximation is allowed only when both the $|L_E(c_k)|$ and the $|L_D(c_k)|$
values are larger than $L_{TER}$. As already discussed, LLR clipping approaches which account only for the
performance before decoding can result in small $L_A(c_k)$ values which would consequently affect the
outcome of the channel decoder and therefore the resulting performance. In addition, as will be shown
in the sequel, clipping approaches targeting only $|L_E(c_k)| < L_{TER}$ result in performance degradation.

From (14), it follows that 

$$L_D(c_k) = \left(\lambda_k^{\overline{MAP}} - \lambda^{MAP}\right) c_k^{MAP}, \qquad c_k^{MAP} = \mathrm{sign}\{L_D(c_k)\}, \tag{21}$$

with $\lambda^{MAP} = \min\{f(s_u)\}$ being the minimum $f(s_u)$ value found during the corresponding unconstrained 
single-tree search, $c_k^{MAP}$ the $k$-th bit value of the symbol vector providing $\lambda^{MAP}$, and 
$\lambda_k^{\overline{MAP}} = \min_{s_u \in S_k^{\overline{MAP}}}\{f(s_u)\}$, where $S_k^{\overline{MAP}}$ are the sub-sets of possible symbol sequences having their $k$-th 
bit value opposite to the one of the MAP solution. Therefore, the search space for any of the trees can 
be reduced to a hypersphere of 

$$r^2_{PDC,\pm,k} = \hat{\lambda}^{MAP} + |L_A(c_k)| + L_{TER} + \frac{1}{2}\left(\hat{c}_k^{MAP} - \mathrm{sign}\{L_A(c_k)\}\right) L_A(c_k), \tag{22}$$

where $\hat{\lambda}^{MAP}$ is the minimum $f(s_u)$ value already found and $\hat{c}_k^{MAP}$ is the related $k$-th bit value. Then, 
the corresponding pruning constraint can become 

$$d\left(s^{(i)}\right) > \max \min\left\{r^2_{\pm,k},\, r^2_{PDC,\pm,k}\right\}. \tag{23}$$

Any time a new candidate $\hat{\lambda}^{MAP}$ is found the corresponding $r^2_{\pm,k}$ values can be updated to 
$r^2_{\pm,k} \leftarrow \max\{r^2_{\pm,k}, r^2_{PDC,\pm,k}\}$. This is performed in order to produce a clipped LLR value even if no solution of 
(5) lies in the search hypersphere for the corresponding bit. Both selective LLR update and hypersphere 
reduction result in a tighter constraint check than the typical one (see (20)) and thus, in reduced SD processing. 

This search space reduction results in bounded $|L_D(c_k)|$ values. From (21) and (22) it can be easily 
deduced that for $\hat{\lambda}^{MAP} = \lambda^{MAP}$ the corresponding $L_E^{max}(c_k)$ values, which maximize $|L_D(c_k)|$, are 

$$L_E^{max}(c_k) = \begin{cases} \mathrm{sign}\{L_D(c_k)\}\, L_{TER}, & \mathrm{sign}\{L_A(c_k)\} = \mathrm{sign}\{L_D(c_k)\} \\ \mathrm{sign}\{L_D(c_k)\}\left(L_{TER} + |L_A(c_k)|\right), & \text{else.} \end{cases} \tag{24}$$

From the above equation it becomes apparent that when the a-priori information of the SD is of the same 
sign as its a-posteriori information the clipping value is such that no processing is spent for calculating 
values which exceed the TER constraint. For example, if $\mathrm{sign}\{L_A(c_k)\} = \mathrm{sign}\{L_D(c_k)\} = 1$, the 
maximum $L_E(c_k)$ value equals $L_{TER}$. In addition, the proposed clipping preserves the ability of the 
bits to reach the TER before decoding. For example, if $\mathrm{sign}\{L_A(c_k)\} \neq \mathrm{sign}\{L_D(c_k)\} = 1$ then 
$L_D^{max}(c_k) = L_E^{max}(c_k) + L_A(c_k) = L_{TER} + |L_A(c_k)| - |L_A(c_k)| = L_{TER}$. 

The fact that the clipping process employs the $\hat{c}_k^{MAP}$ estimates instead of the exact $c_k^{MAP}$ may 
sometimes lead to tighter LLR clipping than wanted. However, in Section IV it is shown that this does 
not have any considerable effect on the scheme's performance. 
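
The relation between the reduced radius of (22) and the clipping values of (24) can be checked numerically. The sketch below is a minimal illustration under stated assumptions: bit values are taken in {+1, -1}, the extrinsic value is taken as $L_E = L_D - L_A$, and the counter-hypothesis metric is assumed to sit exactly on the sphere surface. The function names `pdc_radius` and `max_llrs` are hypothetical.

```python
def sign(x: float) -> int:
    """Sign in {+1, -1}; zero is mapped to +1 by convention (an assumption)."""
    return 1 if x >= 0 else -1


def pdc_radius(lam_map: float, c_map: int, l_a: float, l_ter: float) -> float:
    """Reduced search radius of (22) for one bit."""
    return lam_map + abs(l_a) + l_ter + 0.5 * (c_map - sign(l_a)) * l_a


def max_llrs(radius: float, lam_map: float, c_map: int, l_a: float):
    """Largest a-posteriori and extrinsic LLRs still resolved inside the radius."""
    l_d = (radius - lam_map) * c_map  # (21) with the counter-hypothesis on the surface
    l_e = l_d - l_a                   # extrinsic = a-posteriori minus a-priori
    return l_d, l_e


# Hypothetical numbers: lambda_MAP = 3, L_TER = 6, |L_A| = 2, MAP bit = +1.
agree = max_llrs(pdc_radius(3.0, 1, 2.0, 6.0), 3.0, 1, 2.0)
differ = max_llrs(pdc_radius(3.0, 1, -2.0, 6.0), 3.0, 1, -2.0)
print(agree)   # (L_D, L_E) when sign(L_A) equals sign(L_D)
print(differ)  # (L_D, L_E) when the signs differ
```

With agreeing signs the extrinsic value is bounded by $L_{TER}$, while with differing signs the a-posteriori value is bounded by $L_{TER}$ and the extrinsic value by $L_{TER} + |L_A(c_k)|$, in line with (24).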

3) Simplified Performance-Driven Soft Information Clipping (sPDC): A simplified PDC can be 
acquired when it is not of interest to preserve the TER performance before channel decoding, namely, 
when clipping is allowed for bits with $|L_D(c_k)| < L_{TER}$. Then, the search hypersphere can be reduced 
to 

$$r^2_{sPDC,\pm,k} = \hat{\lambda}^{MAP} + |L_A(c_k)| + L_{TER} + \left(\hat{c}_k^{MAP} - \mathrm{sign}\{L_A(c_k)\}\right) L_A(c_k). \tag{25}$$

Equivalently to (24), $L_E^{max}(c_k) = \mathrm{sign}\{L_D(c_k)\}\, L_{TER}$ even for $\mathrm{sign}\{L_A(c_k)\} \neq \mathrm{sign}\{L_D(c_k)\}$. 
Therefore, for the previous example ($\mathrm{sign}\{L_A(c_k)\} \neq \mathrm{sign}\{L_D(c_k)\} = 1$), $L_D^{max}(c_k) = L_A(c_k) + L_{TER} = L_{TER} - |L_A(c_k)| < L_{TER}$. 
However, as shown in Section IV, not preserving the TER before channel decoding results in a 
noticeable performance degradation without providing any significant complexity gain. 
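
As a quick check of (25), the example with differing signs can be worked through explicitly. The derivation below assumes the usual relation $L_E(c_k) = L_D(c_k) - L_A(c_k)$ and that the counter-hypothesis metric is clipped exactly at the reduced radius; both are assumptions made for illustration.

```latex
% Case sign{L_A(c_k)} = -1 and sign{L_D(c_k)} = \hat{c}_k^{MAP} = +1
\begin{align*}
\left(\hat{c}_k^{MAP} - \mathrm{sign}\{L_A(c_k)\}\right) L_A(c_k)
  &= 2 L_A(c_k) = -2\,|L_A(c_k)|, \\
L_D^{max}(c_k) = r^2_{sPDC,\pm,k} - \hat{\lambda}^{MAP}
  &= L_{TER} - |L_A(c_k)|, \\
L_E^{max}(c_k) = L_D^{max}(c_k) - L_A(c_k)
  &= L_{TER} - |L_A(c_k)| + |L_A(c_k)| = L_{TER}.
\end{align*}
```

The a-posteriori value is thus clipped below $L_{TER}$ by exactly $|L_A(c_k)|$, which is the loss of pre-decoding TER performance mentioned above.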

4) Decoder-Aware Performance-Driven Soft Information Clipping (DA-PDC): Tighter LLR clipping 
than the one of the PDC can be performed by making some further (approximate) assumptions on the 
"expected" reliability (i.e., LLR amplitude) increase provided by the SISO channel decoder. In detail, if 
after the SD processing the sign of the demapped bit is sustained (i.e., $\mathrm{sign}\{L_A(c_k)\} = \mathrm{sign}\{L_E(c_k)\}$) 
it means that the iterative process increases its confidence for this (hard) decoded bit. Then, it is 
approximately assumed that the sign of the decoder's extrinsic information (which will be the SD's 
a-priori information) will remain constant, and the magnitude will be at least the same. This can be 
typically observed when the iterative process is close to the state of constant information flow where the 
most significantly contributing sequences in (15) remain the same and the related a-priori information 
has already reached its constant flow state or still increases [6]. Under these assumptions, the search 
hypersphere can be further tightened to accurately calculate only the $L_E(c_k)$ values of those bits which 
cannot reach the TER performance even after SISO channel decoding. In detail, the search space can be 
reduced to a hypersphere of 

$$r^2_{DA\text{-}PDC,\pm,k} = \hat{\lambda}^{MAP} + L_{TER} + \frac{1}{2}\left(\hat{c}_k^{MAP} - \mathrm{sign}\{L_A(c_k)\}\right)\left(\hat{c}_k^{MAP}\,|L_A(c_k)| + L_A(c_k)\right) \tag{26}$$

with the corresponding constraint to become 

$$d\left(s^{(i)}\right) > \max \min\left\{r^2_{\pm,k},\, r^2_{DA\text{-}PDC,\pm,k}\right\}. \tag{27}$$

Then, it can be easily shown that the $L_E^{max}(c_k)$ values which maximize the $|L_D(c_k)|$ value are 



$$L_E^{max}(c_k) = \begin{cases} \mathrm{sign}\{L_D(c_k)\}\, L_{TER} - L_A(c_k), & \mathrm{sign}\{L_A(c_k)\} = \mathrm{sign}\{L_D(c_k)\} \\ \mathrm{sign}\{L_D(c_k)\}\left(L_{TER} + |L_A(c_k)|\right), & \text{else.} \end{cases} \tag{28}$$

From (28) it becomes apparent that with such a hypersphere reduction, when $\mathrm{sign}\{L_A(c_k)\} = \mathrm{sign}\{L_D(c_k)\}$, 
only values which are not expected to reach the TER after decoding are accurately calculated while, if 
$\mathrm{sign}\{L_A(c_k)\} \neq \mathrm{sign}\{L_D(c_k)\}$, extrinsic information values up to $L_{TER}$ are accurately calculated 
similarly to the PDC. For example, if $\mathrm{sign}\{L_A(c_k)\} = \mathrm{sign}\{L_D(c_k)\} = 1$ the corresponding maximum 
soft information input to the SISO channel decoder for the bit $\pi^{-1}(c_k)$ will be 
$L_A(\pi^{-1}(c_k)) = L_{TER} - L_A(c_k) = L_{TER} - L_E(\pi^{-1}(c_k))$, where the last term is the extrinsic 
information delivered by the channel decoder in the previous iteration. Therefore, if the decoder's $L_E$ of the current iteration is at least 
equal to the previous one, its $L_D$ will meet the TER requirement after decoding. 

For the bits whose LLR value is of such a high magnitude that a solution of (5) does not lie in 
this shrunken hypersphere, clipping is performed according to the PDC by updating $r^2_{\pm,k}$ as 
$r^2_{\pm,k} \leftarrow \max\{r^2_{\pm,k}, r^2_{PDC,\pm,k}\}$ any time a new candidate $\hat{\lambda}^{MAP}$ is found. Then, if the a-posteriori information of 
the bit does not belong in the shrunken hypersphere its LLR value will be set to such a value that its 
extrinsic information reaches $L_{TER}$, similarly to the PDC. 

According to this last approximation, a bit with loose extrinsic and strong a-priori information (and 
$\mathrm{sign}\{L_A(c_k)\} = \mathrm{sign}\{L_D(c_k)\}$) may be erroneously assumed to have reached the TER. However, if 
this wrong assumption is critical for the convergence of its neighboring A bits, it will be manifested as 
looser extrinsic information at the channel decoder output (or looser SD a-priori information) 
over the next iteration, similarly to what has been discussed in Section III.B. Subsequently, this will 
result in an increase of the search hypersphere during the next iteration and, thus, in more accurate LLR 
estimation. 
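
The decoder-aware argument can be illustrated numerically. The sketch below is a simplified model under explicit assumptions: it evaluates the clipping values of (28) and adds the decoder's a-priori input and extrinsic output to obtain its a-posteriori value. The function names `da_pdc_max_extrinsic` and `decoder_posterior` are hypothetical and are not taken from the paper.

```python
def sign(x: float) -> int:
    """Sign in {+1, -1}; zero is mapped to +1 by convention (an assumption)."""
    return 1 if x >= 0 else -1


def da_pdc_max_extrinsic(l_a: float, l_d_sign: int, l_ter: float) -> float:
    """Clipping value of the SD extrinsic LLR according to (28)."""
    if sign(l_a) == l_d_sign:
        return l_d_sign * l_ter - l_a
    return l_d_sign * (l_ter + abs(l_a))


def decoder_posterior(decoder_a_priori: float, decoder_extrinsic: float) -> float:
    """A-posteriori LLR at the channel decoder: a-priori input plus extrinsic output."""
    return decoder_a_priori + decoder_extrinsic


# Hypothetical numbers: the decoder delivered an extrinsic value of 4 in the
# previous iteration (the SD a-priori value), and L_TER = 6.
sd_out = da_pdc_max_extrinsic(4.0, 1, 6.0)   # clipped SD extrinsic value
same = decoder_posterior(sd_out, 4.0)        # decoder extrinsic unchanged
better = decoder_posterior(sd_out, 5.0)      # decoder extrinsic increased
print(sd_out, same, better)
```

With an unchanged decoder extrinsic value the a-posteriori LLR lands exactly on $L_{TER}$, and any increase pushes it beyond, which is the assumption the DA-PDC relies on.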

5) Simplified Decoder-Aware Performance-Driven Soft Information Clipping (sDA-PDC): Similarly to 
the sPDC, when the preservation of the TER performance before decoding is not targeted, the hypersphere 
can be reduced to $r^2_{sDA\text{-}PDC,\pm,k} = \hat{\lambda}^{MAP} + L_{TER}$, resulting in $L_E^{max}(c_k) = \mathrm{sign}\{L_D(c_k)\}\, L_{TER} - 
L_A(c_k)$. In such a case the $r^2_{\pm,k} \leftarrow \max\{r^2_{\pm,k}, r^2_{sPDC,\pm,k}\}$ update is performed any time a new candidate 
$\hat{\lambda}^{MAP}$ is found. Similarly to the sPDC, and as shown in Section IV, this approach does not provide 
any significant complexity gain compared to the DA-PDC but it results in a noticeable performance 
degradation. 

The discussed LLR clipping approaches are selected in a way that the necessity of accurately calculating 
both the $|L_E|$ and $|L_D|$ values up to $L_{TER}$ is revealed. However, the proposed manifestations are not 
unique and several alternatives of similar complexity can be found, which still meet the same criteria but 
in a less tight way. For example, it can be easily verified that similarly to the PDC, a reduced hypersphere 
of $r^2 = \hat{\lambda}^{MAP} + |L_A(c_k)| + L_{TER}$ would result in clipped values only after both $L_D(c_k)$ and $L_E(c_k)$ 
meet the TER, but it would allow a larger $L_E^{max}(c_k)$ when $\mathrm{sign}\{L_A(c_k)\} \neq \mathrm{sign}\{L_D(c_k)\}$. This 
can be shown by simulations to result only in an incremental increase in the number of visited nodes. 
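
The looser bound of this alternative can be verified in two lines. The derivation below assumes, as an illustration, the relation $L_E(c_k) = L_D(c_k) - L_A(c_k)$ and a counter-hypothesis metric clipped exactly at the radius.

```latex
% Alternative radius: r^2 = \hat{\lambda}^{MAP} + |L_A(c_k)| + L_{TER}
\begin{align*}
|L_D^{max}(c_k)| &= r^2 - \hat{\lambda}^{MAP} = L_{TER} + |L_A(c_k)|, \\
L_E^{max}(c_k) &= \mathrm{sign}\{L_D(c_k)\}\left(L_{TER} + |L_A(c_k)|\right) - L_A(c_k) \\
 &= \begin{cases}
   \mathrm{sign}\{L_D(c_k)\}\, L_{TER}, & \mathrm{sign}\{L_A(c_k)\} = \mathrm{sign}\{L_D(c_k)\} \\
   \mathrm{sign}\{L_D(c_k)\}\left(L_{TER} + 2\,|L_A(c_k)|\right), & \text{else.}
 \end{cases}
\end{align*}
```

Compared with the PDC, the bound for differing signs grows from $L_{TER} + |L_A(c_k)|$ to $L_{TER} + 2|L_A(c_k)|$, while the bound for agreeing signs is unchanged.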

C. Scenario-Adaptive SISO Channel Decoder 

As already discussed in Section III.A, step 3, the proposed scenario-adaptive SISO channel decoder 
performs decoding only on a subset of LLR values. Typical SISO channel decoder realizations operate 
in the log domain and employ the max* function in order to replace the computationally expensive 
multiplications with additions as described in [8]. Then, as shown in [19], [20], the most expensive 
operations become the necessary, energy-consuming memory accesses and especially the ones related 
to the state metric storages. The significance of reducing those memory accesses is emphasized in [19] 
where additional processing and register file storage is paid for this reason. However, even with such 
approaches, the number of accesses cannot be substantially reduced due to the energy overhead of the 
processing and the register file storage. In the sequel, equivalently to [21], it is discussed how the selective 
LLR update of the non-RWC bits may result in a reduced number of state metric storages. However, it is 
significant to note that this discussion is just indicative since the selective updates cannot be quantified 
into energy savings without considering a specific implementation, which is beyond the scope of this 
work. 

January 19, 2013 DRAFT 

THE FINAL VERSION OF THIS PAPER APPEARS IN IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY 

For a convolutional code of rate 1/2 and with $c_{x,t}(e)$ the encoder output bits for a transition $e$ from 
the state $s$ to $s'$ at coding time $t$ (with $s, s' = 0, \ldots, N_s - 1$ and $x = 0, 1$) the corresponding $L_D(c_{x,t})$ can 
be expressed as [1] 

$$L_D(c_{x,t}) = \operatorname*{max^{*}}_{e:\, c_{x,t} = 1} \left[\delta_t(e)\right] - \operatorname*{max^{*}}_{e:\, c_{x,t} = -1} \left[\delta_t(e)\right] \tag{29}$$

with 

$$\delta_t(e) = \alpha_{t-1}(s) + c_{0,t}(e) L_A\left(c_{0,t}(e)\right) + c_{1,t}(e) L_A\left(c_{1,t}(e)\right) + \beta_t(s') \tag{30}$$

and $\alpha_t$, $\beta_t$ being the state metrics obtained through the following forward and backward recursions 

$$\alpha_t(w) = \operatorname*{max^{*}}_{e:\, s' = w} \left[\alpha_{t-1}(s) + c_{0,t}(e) L_A\left(c_{0,t}(e)\right) + c_{1,t}(e) L_A\left(c_{1,t}(e)\right)\right] \tag{31}$$

$$\beta_t(w) = \operatorname*{max^{*}}_{e:\, s = w} \left[\beta_{t+1}(s') + c_{0,t+1}(e) L_A\left(c_{0,t+1}(e)\right) + c_{1,t+1}(e) L_A\left(c_{1,t+1}(e)\right)\right]. \tag{32}$$

As discussed in [20], the $\alpha_t(w)$ values can be calculated and overwritten immediately as they are not 
required in future calculations. On the other hand, typically, all $\beta_t(w)$ metrics need to be stored. However, 
for selective (per bit) channel decoding only the subset of $\beta_t(w)$ values related to the decoded bits needs 
to be stored, resulting in potential energy consumption savings. 
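
The two ingredients of this discussion, the max* operation and the reduced number of stored $\beta_t$ vectors, can be sketched as follows. This is a minimal illustration and not the decoder realization of the paper: `max_star` implements the standard Jacobian logarithm, while `beta_steps_needed` is a hypothetical counting model which assumes that the $\beta_t$ vector of a trellis step is needed whenever at least one of its two coded bits requires an LLR update.

```python
import math


def max_star(a: float, b: float) -> float:
    """Jacobian logarithm: ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a - b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def beta_steps_needed(needs_update) -> int:
    """Count trellis steps whose beta vector must be stored.

    needs_update[2 * t + x] is True when L_D(c_{x,t}) has to be computed
    (rate-1/2 code, two coded bits per trellis step).
    """
    steps = len(needs_update) // 2
    return sum(
        1 for t in range(steps)
        if needs_update[2 * t] or needs_update[2 * t + 1]
    )


# Hypothetical flag pattern over four trellis steps (eight coded bits).
flags = [True, False, False, False, False, True, False, False]
stored = beta_steps_needed(flags)
print(stored, "of", len(flags) // 2, "beta vectors stored")
```

In this toy pattern only half of the $\beta_t$ vectors would be written to memory; the actual saving depends on how the non-RWC bits are distributed over the trellis.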

D. Complexity Issues 

In this subsection, some complexity issues are discussed without considering the early-stopping control, 
since it is not required by the proposed scheme and can be replaced by other similar early-stopping 
approaches. 

An additional memory of K bits is required for storing the flag sequence G^'^'> (k). The additional 
required interleaving effort for the flag sequence is a small portion of the overall interleaving one, since 
the introduced one bit overhead is typically small compared to the number of bits employed to represent 
the extrinsic LLR values in fixed-point arithmetic. De-interleaving is not required since the flag sequence 
does not change within iterations. An additional complexity increase of 2K real number comparisons 
is introduced for RWC bit identification. The one-bit comparisons which are required to identify the 
position of the RWC bits at the SD and SISO outer decoder side are assumed negligible compared to 
the real ones. 

In order to assess the SD complexity gains via simulations in Section IV, the number of visited 
nodes is employed as an indicative measure. However, similar results hold for other measures such as the 
number of expanded nodes or the number of required partial distance calculations. 

As already discussed, the energy savings at the SISO channel decoder side can be quantified only 
for specific implementations. However, since the main potential gain is expected to originate from the 
minimization of the memory accesses, two measures are indicatively considered related to the different 
types of memory accesses. The first one is the number of the non-RWC bits, which is related to the number 
of accesses required for the channel decoder's a-priori information update. The other is the number of 
the required $\boldsymbol{\beta} = [\beta_t(0), \ldots, \beta_t(N_s - 1)]^T$ calculations, which is related to the state metric storages. 

IV. Simulations 

A 4 × 4 MIMO system is assumed operating over a spatially and temporally uncorrelated Rayleigh 
flat-fading channel. The encoded bits are mapped onto 16-QAM via Gray coding. A systematic $(5/7)_8$ 
recursive convolutional code of rate 1/2 is employed with a code block of 18432 bits. The log-MAP BCJR 
algorithm has been employed for SISO channel decoding. Early stopping control of error rate equal to 
the TER is always assumed (even with the typical SD). 
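
For readers unfamiliar with the octal notation, the following Python sketch shows one common reading of a systematic $(5/7)_8$ recursive convolutional encoder. It is an assumption made here that 7 (binary 111, i.e., $1 + D + D^2$) denotes the feedback polynomial and 5 (binary 101, i.e., $1 + D^2$) the feedforward polynomial; the function name `rsc_encode` is hypothetical and trellis termination is not modeled.

```python
def rsc_encode(bits):
    """Rate-1/2 systematic recursive convolutional encoder, generators (1, 5/7) octal.

    Returns a list of (systematic, parity) pairs, one per input bit.
    """
    s1 = s2 = 0                     # shift register contents: a[t-1], a[t-2]
    out = []
    for u in bits:
        a = u ^ s1 ^ s2             # feedback taps 1 + D + D^2 (octal 7)
        p = a ^ s2                  # feedforward taps 1 + D^2 (octal 5)
        out.append((u, p))
        s1, s2 = a, s1              # shift the register
    return out


# Impulse response of the parity branch for the first four time steps.
coded = rsc_encode([1, 0, 0, 0])
parity = [p for _, p in coded]
print(parity)
```

Because of the feedback, a single input 1 keeps producing parity bits, which is what makes the code recursive.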

In Fig. 2 the BER performance of the proposed scheme is depicted for an SNR of 7 dB. Three proposed 
SD approaches are compared to the typical SD. These are the SU, the SU & PDC, and the SU & DA-PDC. 
The TER is set to 2 · 10"^, slightly lower than the best achievable BER (≈ 2.2 · 10"^). Selective SISO 
channel decoding is employed with w = 1. It is shown that the proposed RWC identification methodology 
is reliable enough that negligible performance loss is observed even for selective SISO channel decoding 
with the minimum w = 1. In Fig. 3 the (cumulative over iterations) complexity of the corresponding SD 
approaches is shown, while in Fig. 4 the (cumulative) required β stores and the number of the non-RWC 
bits are depicted. It is shown that, at iteration 5, the SU provides an SD complexity gain of about 28%. 
The SU & PDC approach provides a complexity gain of about 71% compared to the SU approach, while 
the SU & DA-PDC SD provides an additional complexity gain of about 25% compared to the SU & 
PDC one, with the total complexity gain, compared to the typical SD, reaching 84%. At the same time, for 
the SU & DA-PDC soft demapper, the gains related to the β stores and the number of the non-RWC bits 
reach 41% and 46%, respectively. It is also shown that the corresponding store requirements are 
slightly dependent on the SD approach and only at high iterations. This is an expected behavior since a 
"good" soft-information approximation should not affect the process towards bit convergence, but only 
the point where the convergence stops. 
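
The reported total gain is consistent with the individual gains if these are read as relative reductions in the number of visited nodes that compose multiplicatively (an interpretation assumed here for the check):

```latex
\begin{align*}
G_{total} &= 1 - (1 - 0.28)(1 - 0.71)(1 - 0.25) \\
          &= 1 - 0.72 \times 0.29 \times 0.75 \\
          &= 1 - 0.1566 \approx 0.84 .
\end{align*}
```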

In Figs. 5 and 6 the efficiency of the proposed performance-driven clipping methods is depicted in terms 
of BER performance and SD complexity savings, when combined with SU. An SNR of 7 dB is assumed 
with a TER of 2 · 10'^ and full SISO channel decoding. It is shown that both the PDC and DA-PDC 
approaches allow reaching the TER performance with a negligible performance loss (which is visible 
only for the DA-PDC). On the contrary, their simplified versions result in non-negligible performance 
loss despite their slightly increased complexity (i.e., tighter clipping can result in RWC bit identification 
that is delayed over iterations). As shown, at iteration 5, the (s)DA-PDC approaches can provide an additional 
complexity gain of about 25% compared to the (s)PDC ones. 

In Figs. 7 and 8 the proposed approach is compared to the performance of a typical iterative scheme 
for the SNRs of 7 and 9 dB respectively and for several TER values. An SD with SU & DA-PDC and 
selective channel decoding of w = 1 are employed. As shown in Figs. 9 and 10, at iteration 3 for example, 
a TER reduction of an order of magnitude results in SD complexity gains of 30-42%. In addition, the 
overall SD complexity gain ranges from 82 to 90%. In Figs. 11 and 12 it is shown that significant gains in 
the number of memory accesses can be achieved only over higher iterations where the number of RWC 
bits is adequately high. For example, for 9 dB and iteration 3, gains of 33% and 38% are observed in 
the number of the β stores and the number of the non-RWC bits respectively, for a TER of 10~"^, while 
for a TER of 10~^ the convergence process stops earlier and the corresponding gains become 21% and 
26%. 

V. Conclusion 

An iterative receiver processing framework of approximate soft information exchange has been proposed 
which allows the adjustment of the receiver processing requirements (i.e., of the soft-output 
detector and of the SISO channel decoder) to the transmission conditions and the required BER. In 
this context, several performance-driven LLR clipping methods together with partial soft information 
update are proposed in order to adjust the complexity of the receiver to the target performance. Despite 
the small additional overhead the approach can provide substantial complexity savings both at the 
soft-output detector and the channel decoder. 

Fig. 1. Block diagram of a typical iterative scheme (solid lines) with the proposed modification (dashed lines) for approximate 
soft information flow. 

Fig. 2. BER performance for a system with selective channel decoding of w = 1 and soft demapping with SU, SU & PDC, 
and SU & DA-PDC at 7 dB. 

Fig. 3. Soft demapping complexity for a system with selective channel decoding of w = 1 and soft demapping with SU, SU 
& PDC, and SU & DA-PDC at 7 dB. 

Fig. 4. Memory store requirements for a system with selective channel decoding of w = 1 and soft demapping with SU, SU 
& PDC, and SU & DA-PDC at 7 dB. 

Fig. 5. BER performance for a system with full channel decoding and soft demappers with SU and different clipping approaches 
at 7 dB. 

Fig. 6. Soft demapping complexity for a system with full channel decoding and soft demappers with SU and different clipping 
approaches at 7 dB. 

Fig. 7. BER performance for a system with selective channel decoding of w = 1, SU & DA-PDC soft demapping and several 
TER values at 7 dB. 

Fig. 8. BER performance for a system with selective channel decoding of w = 1, SU & DA-PDC soft demapping and several 
TER values at 9 dB. 

Fig. 9. Soft demapping complexity for a system with selective channel decoding of w = 1, SU & DA-PDC soft demapping 
and several TER values at 7 dB. 

Fig. 10. Soft demapping complexity for a system with selective channel decoding of w = 1, SU & DA-PDC soft demapping 
and several TER values at 9 dB. 

Fig. 11. Memory store requirement for a system with selective channel decoding of w = 1, SU & DA-PDC soft demapping 
and several TER values at 7 dB. 

Fig. 12. Memory store requirement for a system with selective channel decoding of w = 1, SU & DA-PDC soft demapping 
and several TER values at 9 dB. 